Abstract

The poster presents an analytic formalism describing metric properties
of undirected random graphs with arbitrary degree distributions
and statistically uncorrelated (i.e. randomly connected) vertices. The
formalism allows one to calculate the main network characteristics:
the position of the phase transition at which a giant component first
forms, the mean component size below the phase transition, and the size
of the giant component and the average path length above the phase
transition. Although most of these properties were previously
calculated by means of generating functions, we think that our
derivations are conceptually much simpler.

A poster presented at Midterm Conference COSIN - 
1 Introduction

Let us start with the following lemma. 
Lemma 1 If $A_1, A_2, \dots, A_n$ are mutually independent events and their
probabilities fulfill the relations $\forall_i\, P(A_i) \le \varepsilon$, then

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = 1 - \exp\Big(-\sum_{i=1}^{n} P(A_i)\Big) - Q, \tag{1}$$

where the error term $Q$ may be neglected in the limit of large $n$.

The complete proof of the Lemma is given in [1]. In the course of the
presentation, we will take advantage of the Lemma several times.
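
As a quick numerical illustration of Lemma 1, the sketch below compares the exact probability of a union of independent events, $1 - \prod_i (1 - P(A_i))$, with the approximation $1 - \exp(-\sum_i P(A_i))$ in which $Q$ is dropped. The function names and the choice of $n$ equal probabilities $c/n$ are assumptions made only for this example.

```python
import math


def union_exact(probs):
    """Exact P(A_1 or ... or A_n) for mutually independent events."""
    none_occurs = 1.0
    for p in probs:
        none_occurs *= 1.0 - p
    return 1.0 - none_occurs


def union_lemma(probs):
    """Lemma 1 with the error term Q neglected."""
    return 1.0 - math.exp(-sum(probs))


# Hypothetical test case: n events, each with probability c / n.
c = 2.0
for n in (10, 100, 10_000):
    probs = [c / n] * n
    print(n, union_exact(probs), union_lemma(probs))
```

The two columns should approach each other as $n$ grows, which is the regime in which the Lemma is used throughout the text.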

A random graph with a given degree distribution $P(k)$ is the simplest
network model [2]. In such a network the total number of vertices $N$ is fixed.
Degrees of all vertices are independent, identically distributed random
integers drawn from a specified distribution $P(k)$, and there are no vertex-vertex
correlations. Because of the lack of correlations, the probability that there
exists a walk of length $x$ crossing the index-linked vertices $\{i, v_1, v_2, \dots, v_{(x-1)}, j\}$
is described by the product $p_{i v_1}\, p_{v_1 v_2 | i v_1}\, p_{v_2 v_3 | v_1 v_2} \cdots p_{v_{(x-1)} j | v_{(x-2)} v_{(x-1)}}$, where

$$p_{ij} = \frac{k_i k_j}{\langle k \rangle N} \tag{2}$$

gives the connection probability between vertices $i$ and $j$ with degrees $k_i$ and
$k_j$ respectively, whereas

$$p_{ij|li} = \frac{(k_i - 1) k_j}{\langle k \rangle N} \tag{3}$$

describes the conditional probability of a link $\{i,j\}$ given that there exists
another link $\{l,i\}$. Taking advantage of Lemma 1, one can write the
probability $p_{ij}^{+}(x)$ of at least one walk of length $x$ between $i$ and $j$ as

$$p_{ij}^{+}(x) = 1 - \exp\Big[-\sum_{v_1=1}^{N} \sum_{v_2=1}^{N} \cdots \sum_{v_{(x-1)}=1}^{N} p_{i v_1} \cdots p_{v_{(x-1)} j | v_{(x-2)} v_{(x-1)}}\Big]. \tag{4}$$

Putting (2) and (3) into (4) and replacing the sum over node indices
by a sum over the degree distribution $P(k)$, one gets

$$p_{ij}^{+}(x) = 1 - \exp\Big[-\frac{k_i k_j}{N} \frac{\langle k(k-1) \rangle^{x-1}}{\langle k \rangle^{x}}\Big]. \tag{5}$$
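
To make Eq. (5) concrete, the following sketch evaluates it for a given degree sequence. The function name `walk_probability` and the two-valued degree sequence are assumptions chosen for illustration; the moments $\langle k \rangle$ and $\langle k(k-1) \rangle$ are estimated directly from the sequence.

```python
import math


def walk_probability(ki, kj, x, degrees):
    """Eq. (5): probability of at least one walk of length x between
    vertices with degrees ki and kj, for the given degree sequence."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_kk1 = sum(k * (k - 1) for k in degrees) / n  # <k(k-1)>
    exponent = ki * kj * mean_kk1 ** (x - 1) / (n * mean_k ** x)
    return 1.0 - math.exp(-exponent)


# Hypothetical degree sequence: half of the vertices have degree 2, half degree 4.
degrees = [2, 4] * 500
for x in range(1, 8):
    print(x, walk_probability(3, 3, x, degrees))
```

For this sequence $\langle k(k-1) \rangle / \langle k \rangle > 1$, so the probability grows with the walk length $x$.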
2 Random graphs below the percolation
threshold - the mean component size

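According to (5), the probability that none among the walks of length $x$
between $i$ and $j$ occurs is given by

$$p_{ij}^{-}(x) = \exp\Big[-\frac{k_i k_j}{N} \frac{\langle k(k-1) \rangle^{x-1}}{\langle k \rangle^{x}}\Big], \tag{6}$$

and respectively the probability that there is no walk of any length between
these vertices may be written as

$$p_{ij}^{-} = \prod_{x=1}^{\infty} p_{ij}^{-}(x) = \prod_{x=1}^{\infty} \exp\Big[-\frac{k_i k_j}{N} \frac{\langle k(k-1) \rangle^{x-1}}{\langle k \rangle^{x}}\Big] = \exp\Big[-\frac{k_i k_j}{\langle k \rangle N} \sum_{y=0}^{\infty} \Big(\frac{\langle k(k-1) \rangle}{\langle k \rangle}\Big)^{y}\Big]. \tag{7}$$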



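The value of $p_{ij}^{-}$ strongly depends on the common ratio of the geometric
series present in the last equation. When the common ratio is greater than
1, i.e. $\langle k^2 \rangle > 2\langle k \rangle$, random graphs are above the percolation threshold: the
sum of the geometric series in (7) tends to infinity and $p_{ij}^{-} = 0$. Below the
phase transition, when $\langle k^2 \rangle < 2\langle k \rangle$, the probability that the nodes $i$ and $j$
belong to separate clusters is given by

$$p_{ij}^{-} = \exp\Big[-\frac{k_i k_j}{N (2\langle k \rangle - \langle k^2 \rangle)}\Big], \tag{8}$$

and respectively the probability that $i$ and $j$ belong to the same cluster may
be written as

$$p_{ij}^{+} = 1 - p_{ij}^{-} = 1 - \exp\Big[-\frac{k_i k_j}{N (2\langle k \rangle - \langle k^2 \rangle)}\Big]. \tag{9}$$

Now, it is simple to calculate the mean size of the cluster that the node
$i$ belongs to. It is given by

$$\langle s_i \rangle = \sum_{j} p_{ij}^{+} + 1 \simeq \frac{\langle k \rangle}{2\langle k \rangle - \langle k^2 \rangle}\, k_i + 1. \tag{10}$$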
Figure 1: Average size of the component that a node $i$ with degree $k_i$ belongs
to. Scatter plots represent numerical data, whereas solid lines represent the
prediction of Eq. (10).



Note that the mean size of the component that a node $i$ belongs to is
proportional to the degree $k_i$ of the node (see Fig. 1). The last transformation in
(10) was obtained by taking only the first two terms of the power series expansion
of the exponential function in (9). Averaging the expression (10) over
all nodes in the network, one obtains the well-known formula [2] for the mean
component size in random graphs below the phase transition

$$\langle s \rangle = \frac{\langle k \rangle^{2}}{2\langle k \rangle - \langle k^2 \rangle} + 1. \tag{11}$$

As in percolation theory [2], the mean cluster size diverges at

$$\langle k^2 \rangle = 2\langle k \rangle, \tag{12}$$



signifying that the expression (12) describes the position of the percolation
threshold in random graphs with arbitrary degree distributions 0112111. 
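
The mean component size of Eq. (11) and the threshold condition of Eq. (12) are straightforward to evaluate for a concrete degree sequence. The sketch below is only an illustration under assumed inputs: the function name and the example sequences are hypothetical, and the function returns infinity at or above the threshold, where Eq. (11) no longer applies.

```python
def mean_component_size(degrees):
    """Eq. (11): <s> = <k>^2 / (2<k> - <k^2>) + 1 below the threshold."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    gap = 2.0 * mean_k - mean_k2  # positive only below the threshold, Eq. (12)
    if gap <= 0.0:
        return float("inf")
    return mean_k ** 2 / gap + 1.0


# Hypothetical sequences: the first is below the threshold, the second above it.
print(mean_component_size([1, 2] * 500))
print(mean_component_size([2, 4] * 500))
```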
3 Random graphs above the percolation
threshold - the size of the giant component

When $\langle k^2 \rangle > 2\langle k \rangle$ the giant component (GC) is present in the graph. The
size of the giant component $N_{GC}$ scales as the size $N$ of the graph as a whole.
Its relative size $S = N_{GC}/N$ (i.e. the probability that a node belongs to the GC)
is an important quantity in percolation theory and is often identified as the
order parameter. Here we demonstrate how to calculate the size of the giant
component in undirected random graphs. The underlying idea for
calculating $S$ is closely related to the method of calculating $S$ in the Cayley tree
presented in [Zj. 

First we deal with the classical random graphs of Erdős and
Rényi, then we generalize our derivations to the case of random graphs with
arbitrary degree distributions, and we show that our derivations are consistent
with the formalism based on generating functions that was introduced by
Newman et al. [2].

3.1 The giant component in classical random graphs
of Erdős and Rényi

In general terms, classical random graphs consist of a fixed number of vertices
$N$, which are connected at random with a fixed probability $p$ [5].

Let us call $R$ the probability that an arbitrary node $i$ is connected to
the giant component through a fixed link $\{i,j\}$, where $j$ is another arbitrary
node. Since every node in the graph may have $N - 1 \simeq N$ links and all
nodes are equivalent, the formula for $R$ may be written as the product of
the probability $p$ of a link and the probability that at least one of the $N$
possible links emanating from $j$ connects $j$ to the giant component. Taking
advantage of Lemma 1, one can write

$$R = p\,(1 - \exp[-RN]). \tag{13}$$

This self-consistency equation for $R$ has one or two solutions, depending
on whether a graph is below ($pN < 1$) or above ($pN > 1$) the phase transition.
The graphical solution of the equation (13) shown in Fig. 2 presents the easiest
way to obtain a qualitative understanding of the percolation transition in classical
random graphs.

Figure 2: Graphical solution of the equation (13).

The probability that an arbitrary node $i$ belongs to the giant component
is equivalent to the probability that at least one of the $N$ possible links connects
$i$ to the GC. Again, taking advantage of Lemma 1, one gets

$$S = 1 - \exp[-RN]. \tag{14}$$

Comparing both relations (13) and (14), it is easy to see that $R = pS$, and
the expression (14) for the giant component in classical random graphs may
be rewritten in the form

$$S = 1 - \exp[-\langle k \rangle S], \tag{15}$$

where $\langle k \rangle = pN$. Fig. 3 presents the prediction of Eq. (15) in comparison
with numerically calculated sizes of the giant components in classical random
graphs.
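
Eq. (15) has no closed-form solution, but it can be solved numerically. A minimal sketch, assuming simple fixed-point iteration started from $S = 1$ so that it settles on the nontrivial solution whenever one exists; the function name, tolerance and iteration cap are arbitrary choices for this example.

```python
import math


def giant_component_er(mean_k, tol=1e-12, max_iter=100_000):
    """Solve S = 1 - exp(-<k> S), Eq. (15), by fixed-point iteration."""
    s = 1.0  # starting at S = 1 avoids the trivial solution S = 0
    for _ in range(max_iter):
        s_new = 1.0 - math.exp(-mean_k * s)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s  # not fully converged (happens very close to <k> = 1)


for mean_k in (0.5, 1.5, 2.0, 4.0):
    print(mean_k, giant_component_er(mean_k))
```

Below the transition the iteration decays towards $S = 0$; above it, the iteration converges to the positive root.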

It is necessary to stress that both equations (13) and (15) are well known
and have been independently derived using different methods by Newman et
al. [2] and Molloy and Reed.
3.2 The giant component in random graphs
with arbitrary degree distributions P(k)

In the case of classical random graphs all vertices have been considered as
equivalent. This is not acceptable in the case of random graphs with a given
degree distribution $P(k)$, where each node $i$ is characterized by its degree $k_i$.

Here, we call $R^*$ the probability that, following an arbitrary direction of a
randomly chosen edge, we arrive at the giant component. In fact, we know
that following an arbitrary edge we arrive at a vertex $i$ with degree $k_i$. The
probability that $i$ is connected to the GC is $1 - (1 - R^*)^{k_i - 1}$. This expression
gives the probability that at least one of the $k_i - 1$ edges emanating from $i$,
other than the edge we arrived along, connects $i$ to the giant component¹.
Now, it is simple to write the self-consistency condition for $R^*$

$$R^* = \sum_{k_i} \big(1 - (1 - R^*)^{k_i - 1}\big)\, Q(k_i), \tag{16}$$

where $Q(k_i) = k_i P(k_i)/\langle k \rangle$ describes the probability that an arbitrary link
leads to a node $i$ with degree $k_i$. As in the case of classical random graphs,
the equation (16) can be solved graphically, showing that the nontrivial
solution (i.e. $R^* \neq 0$) of the equation (16) exists only for random graphs
above the percolation threshold $\langle k^2 \rangle > 2\langle k \rangle$.

Knowing $R^*$, it is simple to calculate the relative size $S$ of the giant
component in random graphs with arbitrary degree distribution $P(k)$. $S$
is equivalent to the probability that at least one of the $k$ links attached to an
arbitrary node connects the node to the giant component

$$S = \sum_{k} \big(1 - (1 - R^*)^{k}\big)\, P(k). \tag{17}$$

It is easy to show that both equations (17) and (16) are completely equivalent
to the equations derived by Newman et al. by means of generating functions

$$S = 1 - G_0(v), \tag{18}$$

where $v$ is the solution of the equation given below

$$v = G_1(v). \tag{19}$$

¹ We do not take advantage of Lemma 1 here because it works well only in the limit
of a large number of independent events, $n \gg 1$. In the case of small $n$ the error $Q$ of the
Lemma cannot be neglected.
Figure 3: The size of the giant component $S$ versus the mean node degree
$\langle k \rangle$ in classical random graphs of size $N = 10000$. The scatter plot represents
numerical data whereas the solid line gives the solution of Eq. (15).



We recall that $G_0(x)$ is the generating function for the probability distribution
of vertex degrees

$$G_0(x) = \sum_{k} P(k) x^{k}, \tag{20}$$

whereas $G_1(x)$ is given by

$$G_1(x) = \frac{G_0'(x)}{G_0'(1)} = \frac{1}{\langle k \rangle} \sum_{k} k P(k) x^{k-1}. \tag{21}$$

First we show that Eq. (16) is completely equivalent to Eq.
(19). Expression (16) may be transformed in the following way

$$R^* = \frac{1}{\langle k \rangle} \sum_{k} k P(k) - \frac{1}{\langle k \rangle} \sum_{k} k P(k) (1 - R^*)^{k-1} = 1 - G_1(1 - R^*),$$

which exactly corresponds to Eq. (19) with $v = 1 - R^*$. Expression (17) may
be transformed into Eq. (18) in a similar way, when we assume that $v = 1 - R^*$.
Now it is clear that the unknown parameter $v$ in both Eqs. (18) and (19)
has the following meaning:

$$v = 1 - R^* \tag{22}$$

describes the probability that an arbitrary edge in a random graph does not
belong to the giant component.
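
The self-consistency condition (16) and the sum (17) can be evaluated numerically for any degree distribution. The sketch below is one possible way to do it, assuming fixed-point iteration started from $R^* = 1$; the function name, the dictionary representation of $P(k)$, and the example distribution are hypothetical choices for illustration.

```python
def giant_component(pk, tol=1e-12, max_iter=100_000):
    """Solve Eq. (16) for R* by fixed-point iteration, then evaluate Eq. (17).

    pk maps each degree k to its probability P(k).
    Returns the pair (R*, S).
    """
    mean_k = sum(k * p for k, p in pk.items())
    r = 1.0  # start away from the trivial solution R* = 0
    for _ in range(max_iter):
        # Q(k) = k P(k) / <k>; vertices with k = 0 are never reached by an edge.
        r_new = sum(
            (1.0 - (1.0 - r) ** (k - 1)) * k * p / mean_k
            for k, p in pk.items()
            if k > 0
        )
        converged = abs(r_new - r) < tol
        r = r_new
        if converged:
            break
    s = sum((1.0 - (1.0 - r) ** k) * p for k, p in pk.items())
    return r, s


# Hypothetical distribution: half of the vertices have degree 1, half degree 3.
r_star, s = giant_component({1: 0.5, 3: 0.5})
print(r_star, s)
```

For this example Eq. (16) reduces to a quadratic in $v = 1 - R^*$ with the nontrivial root $v = 1/3$, which is also a solution of $v = G_1(v)$ in Eq. (19).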



4 Average path length in random graphs 

This part of the presentation closely follows that of Fronczak et al. [1].

Let us consider the situation when there exists at least one walk of
length $x$ between the vertices $i$ and $j$. If the walk(s) is (are) the shortest
path(s), $i$ and $j$ are exactly $x$-th neighbors; otherwise they are closer neighbors.
In terms of the statistical ensemble of random graphs, the probability $p_{ij}^{+}(x)$ (Eq.
(5)) of at least one walk of length $x$ between $i$ and $j$ also expresses
the probability that these nodes are neighbors of order not higher than $x$.
Thus, the probability that $i$ and $j$ are exactly $x$-th neighbors is given by the
difference²

$$p_{ij}^{*}(x) = p_{ij}^{+}(x) - p_{ij}^{+}(x - 1). \tag{23}$$

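Due to (5), the probability that both vertices are exactly $x$-th neighbors
may be written as

$$p_{ij}^{*}(x) = F(x-1) - F(x), \tag{24}$$

where

$$F(x) = \exp\Big[-\frac{k_i k_j}{N} \frac{(\langle k^2 \rangle - \langle k \rangle)^{x-1}}{\langle k \rangle^{x}}\Big]. \tag{25}$$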

Now, it is simple to calculate the average path length (APL) between $i$ and
$j$. It is given by

$$l_{ij}(k_i, k_j) = \sum_{x=1}^{\infty} x\, p_{ij}^{*}(x) = \sum_{x=0}^{\infty} F(x). \tag{26}$$

Notice that a walk may cross the same node several times, thus the largest
possible walk length can be $x = \infty$.
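
² Note that (23) is only true for random graphs above the percolation threshold, where
$p_{ij}^{*}(x) > 0$.

The Poisson summation formula

$$\sum_{x=0}^{\infty} F(x) = \frac{1}{2} F(0) + \int_0^{\infty} F(x)\,dx + 2 \sum_{n=1}^{\infty} \Big(\int_0^{\infty} F(x) \cos(2 n \pi x)\,dx\Big) \tag{27}$$

allows us to simplify (26). Firstly, let us note that in most real networks
$\langle k \rangle \ll N$, thus we can assume

$$\frac{k_i k_j}{N (\langle k^2 \rangle - \langle k \rangle)} \simeq 0, \tag{28}$$

which gives $F(0) = 1$. Secondly, we have

$$\int_0^{\infty} F(x)\,dx = -\mathrm{Ei}\Big(-\frac{k_i k_j}{N (\langle k^2 \rangle - \langle k \rangle)}\Big) \Big/ \ln\Big(\frac{\langle k^2 \rangle}{\langle k \rangle} - 1\Big), \tag{29}$$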





where $\mathrm{Ei}(y)$ is the exponential integral function, which for negative arguments
is given by $\mathrm{Ei}(-y) = \gamma + \ln y + \int_0^{y} (\exp(-t) - 1)/t\,dt$ [11], where $\gamma \simeq 0.5772$
is Euler's constant. Due to (28), the integral in the expression for $\mathrm{Ei}(-y)$
becomes zero. Finally, every integral in the last term of the summation
formula (27) is equal to zero owing to the generalized mean value theorem
[12]. It follows that the equation for the APL between $i$ and $j$ may be written
as

$$l_{ij}(k_i, k_j) = \frac{-\ln(k_i k_j) + \ln(\langle k^2 \rangle - \langle k \rangle) + \ln N - \gamma}{\ln(\langle k^2 \rangle / \langle k \rangle - 1)} + \frac{1}{2}. \tag{30}$$

The average intervertex distance for the whole network depends on the
specified degree distribution $P(k)$

$$l = \frac{\ln(\langle k^2 \rangle - \langle k \rangle) - 2\langle \ln k \rangle + \ln N - \gamma}{\ln(\langle k^2 \rangle / \langle k \rangle - 1)} + \frac{1}{2}. \tag{31}$$

A similar result $l \simeq \ln N / \ln(\langle k^2 \rangle / \langle k \rangle - 1)$ was obtained by Dorogovtsev
et al. The formulas (30) and (31) diverge when $\langle k^2 \rangle = 2\langle k \rangle$, giving the
well-known expression for the percolation threshold in undirected random graphs.
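
The approximations leading from the sum (26) to the closed form (30) can be checked numerically. The sketch below sums $F(x)$ of Eq. (25) directly and compares the result with Eq. (30). The function names and the parameter values ($N = 10000$, Poisson-like moments $\langle k \rangle = 4$, $\langle k^2 \rangle = 20$) are assumptions made for this example only.

```python
import math

EULER_GAMMA = 0.5772156649015329


def apl_pair_sum(ki, kj, n, mean_k, mean_k2):
    """Direct evaluation of Eq. (26): the sum of F(x) over x = 0, 1, 2, ..."""
    a = mean_k2 - mean_k
    ratio = a / mean_k
    if ratio <= 1.0:
        raise ValueError("the sum converges only above the percolation threshold")
    arg = ki * kj / (n * a)  # minus the exponent of F(0), see Eq. (25)
    total = 0.0
    while arg < 745.0:  # exp(-745) underflows to zero in double precision
        total += math.exp(-arg)
        arg *= ratio  # F(x + 1) has an exponent larger by a factor of 'ratio'
    return total


def apl_pair_closed(ki, kj, n, mean_k, mean_k2):
    """Closed-form approximation of Eq. (30)."""
    a = mean_k2 - mean_k
    numerator = -math.log(ki * kj) + math.log(a) + math.log(n) - EULER_GAMMA
    return numerator / math.log(a / mean_k) + 0.5


print(apl_pair_sum(4, 4, 10_000, 4.0, 20.0))
print(apl_pair_closed(4, 4, 10_000, 4.0, 20.0))
```

The small residual difference between the two numbers comes from the terms of the Poisson summation formula that are neglected in the derivation.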



4.1 Average path length in classical random graphs
of Erdős and Rényi

For these networks the degree distribution is given by the Poisson function
$P(k) = e^{-\langle k \rangle} \langle k \rangle^{k} / k!$. However, since $\langle \ln k \rangle$ cannot be calculated analytically
for the Poisson distribution, the APL may not be directly obtained from (31).
To overcome this problem we take advantage of the mean field approximation.
Let us assume that all vertices within a graph possess the same degree, $\forall_i\, k_i = \langle k \rangle$.
Since $\langle k^2 \rangle - \langle k \rangle = \langle k \rangle^2$ for the Poisson distribution, this implies that the APL
between two arbitrary nodes $i$ and $j$ (30) should
describe the average intervertex distance of the whole network

$$l_{ER} = \frac{\ln N - \gamma}{\ln(pN)} + \frac{1}{2}. \tag{32}$$

Figure 4: The average path length $l_{ER}$ versus network size $N$ in ER classical
random graphs with $\langle k \rangle = pN = 4, 10, 20$. Solid curves represent the numerical
prediction of Eq. (32).



Until now only a rough estimation of this quantity has been known. One
has expected that the average shortest path length of the whole ER graph
scales with the number of nodes in the same way as the network diameter.
Recall that the diameter $d$ of a graph is defined as the maximal distance
between any pair of vertices, and $d_{ER} = \ln N / \ln(pN)$. Fig. 4 shows the prediction
of the equation (32) in comparison with the numerically calculated
APL in classical random graphs.
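
For orientation, Eq. (32) and the diameter estimate $d_{ER}$ can be evaluated side by side. This is a minimal sketch; the function names and the parameter values are assumptions for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329


def apl_er(n, mean_k):
    """Eq. (32): average path length of an ER graph with <k> = pN."""
    return (math.log(n) - EULER_GAMMA) / math.log(mean_k) + 0.5


def diameter_er(n, mean_k):
    """Rough diameter estimate d_ER = ln N / ln(pN)."""
    return math.log(n) / math.log(mean_k)


for mean_k in (4.0, 10.0, 20.0):
    print(mean_k, apl_er(10_000, mean_k), diameter_er(10_000, mean_k))
```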



4.2 Average path length in scale-free
Barabási-Albert networks

The basis of the BA model is its construction procedure. Two important
ingredients of the procedure are continuous network growth and preferential
attachment. The network starts to grow from an initial cluster of $m$ fully
connected vertices. Each new node that is added to the network creates $m$
links that connect it to previously added nodes. Preferential attachment
means that the probability of a new link growing out of a vertex $i$ and ending
up in a vertex $j$ is given by $p_{ij}^{BA} = m k_j(t_i) / \sum_l k_l(t_i)$, where $k_j(t_i)$ [H] denotes
the connectivity of a node $j$ at the time when a new node $i$ is added to
the network. Taking into account the time evolution of node degrees in BA
networks, one can show that the probability $p_{ij}^{BA}$ is equivalent to (2). Now
let us consider the conditional probability $p_{ij|li}$. Checking the possible time
orderings of the vertices $i$, $j$, $l$, it is easy to see that in five of the $3!$ cases $p_{ij|li} = p_{ij}$,
and to a good approximation we get instead of (5) the result

$$p_{ij}^{+}(x) = 1 - \exp\Big[-\frac{k_i k_j}{N} \frac{\langle k^2 \rangle^{x-1}}{\langle k \rangle^{x}}\Big]. \tag{33}$$



It was found [H] that the degree distribution in BA network is given by 
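$P(k) = 2m^2 k^{-\alpha}$, where $k = m, m+1, \dots, m\sqrt{N}$ and the scaling exponent
$\alpha = 3$. Putting $\langle k \rangle = 2m$, $\langle k^2 \rangle = m^2 \ln N$ and taking into account (33), one
gets that the APL between $i$ and $j$ is given by

$$l_{ij}^{BA}(k_i, k_j) = \frac{-\ln(k_i k_j) + \ln N + \ln(2m) - \gamma}{\ln\ln N + \ln(m/2)} + \frac{3}{2}. \tag{34}$$

Averaging (34) over all vertices, with $\langle \ln k \rangle = \ln m + 1/2$ for this degree
distribution in the continuum approximation, we obtain

$$l_{BA} = \frac{\ln N - \ln(m/2) - 1 - \gamma}{\ln\ln N + \ln(m/2)} + \frac{3}{2}. \tag{35}$$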

Fig. 5 shows the APL of BA networks as a function of the network size $N$
compared with the analytical formula (35). There is a visible discrepancy
between the theory and numerical results when $\langle k \rangle = 4$. The discrepancy
disappears when the network becomes denser, i.e. when $\langle k \rangle$ increases.
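
Formula (35) is easy to tabulate against the network size. A minimal sketch follows; the function name and the parameter values are assumptions for illustration, and the guard reflects the fact that the denominator of Eq. (35) must be positive.

```python
import math

EULER_GAMMA = 0.5772156649015329


def apl_ba(n, m):
    """Eq. (35): average path length of a BA network with <k> = 2m."""
    denominator = math.log(math.log(n)) + math.log(m / 2)
    if denominator <= 0.0:
        raise ValueError("Eq. (35) requires (m / 2) * ln N > 1")
    numerator = math.log(n) - math.log(m / 2) - 1.0 - EULER_GAMMA
    return numerator / denominator + 1.5


# <k> = 2m = 4, 10, 20 for m = 2, 5, 10
for m in (2, 5, 10):
    print(m, [round(apl_ba(n, m), 3) for n in (10**3, 10**4, 10**5)])
```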



Figure 5: Characteristic path length $l_{BA}$ versus network size $N$ in BA networks.
Solid lines represent Eq. (35).

4.3 Average path length in scale-free networks
with arbitrary scaling exponent

Let us consider scale-free random graphs with degree distribution given by a
power law, i.e. $P_\alpha(k) = (\alpha - 1) m^{\alpha - 1} k^{-\alpha}$, where $k = m, m+1, \dots, m N^{1/(\alpha - 1)}$
[10]. Taking advantage of (31), we get that for large networks, $N \gg 1$, the
APL scales as follows:

• $l \simeq 2/(3 - \alpha) + 1/2$ for $2 < \alpha < 3$,

• $l \simeq \ln N / \ln\ln N + 3/2$ for $\alpha = 3$,

• $l \simeq \ln N / \ln(m(\alpha - 2)/(\alpha - 3) - 1) + 1/2$ for $\alpha > 3$.

The result for $\alpha > 3$ is consistent with estimations obtained by Cohen and
Havlin [10]. The first case, with $l$ independent of $N$, shows that there is a
saturation effect for the mean path length in large scale-free networks with
scaling exponent in the range $2 < \alpha < 3$. Our derivations show that the
behaviour of the APL within scale-free networks is even more intriguing than
reported by Cohen and Havlin [10].
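
The three scaling regimes listed above can be collected in one helper. This is only a sketch of the large-$N$ expressions as stated in the list, not a finite-size formula; the function name and the example parameters are assumptions.

```python
import math


def apl_scale_free(n, m, alpha):
    """Large-N scaling of the APL for P(k) ~ k^(-alpha), k >= m."""
    if alpha <= 2:
        raise ValueError("the listed regimes cover alpha > 2 only")
    if alpha < 3:
        # saturation: the result does not depend on N
        return 2.0 / (3.0 - alpha) + 0.5
    if alpha == 3:
        return math.log(n) / math.log(math.log(n)) + 1.5
    arg = m * (alpha - 2.0) / (alpha - 3.0) - 1.0  # <k^2>/<k> - 1 for alpha > 3
    if arg <= 1.0:
        raise ValueError("the network must be above the percolation threshold")
    return math.log(n) / math.log(arg) + 0.5


for alpha in (2.5, 3.0, 4.0):
    print(alpha, apl_scale_free(10**6, 3, alpha))
```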